53 research outputs found
Semi-proximal Mirror-Prox for Nonsmooth Composite Minimization
We propose a new first-order optimization algorithm to solve high-dimensional
non-smooth composite minimization problems. Typical examples of such problems
have an objective that decomposes into a non-smooth empirical risk part and a
non-smooth regularization penalty. The proposed algorithm, called Semi-Proximal
Mirror-Prox, leverages the Fenchel-type representation of one part of the
objective while handling the other part of the objective via linear
minimization over the domain. The algorithm stands in contrast with more
classical proximal gradient algorithms with smoothing, which require the
computation of proximal operators at each iteration and can therefore be
impractical for high-dimensional problems. We establish the theoretical
convergence rate of Semi-Proximal Mirror-Prox, which exhibits the optimal
complexity bounds, i.e. $O(1/\epsilon^2)$, for the number of calls to the linear
minimization oracle. We present promising experimental results showing the
interest of the approach in comparison to competing methods.
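To make the notion of a linear minimization oracle concrete, here is a minimal sketch. It assumes the domain is an l1 ball, a choice made here purely for illustration and not taken from the abstract; the function name `lmo_l1_ball` is hypothetical. The oracle returns a point of the domain that minimizes the inner product with a given vector, and for the l1 ball this only requires finding the largest-magnitude coordinate.

```python
def lmo_l1_ball(gradient, radius=1.0):
    """Return a minimizer of <gradient, s> over {s : ||s||_1 <= radius}.

    Illustrative assumption: the domain is an l1 ball. A linear function
    over this ball is minimized at a signed vertex, so it is enough to
    locate the coordinate of `gradient` with the largest absolute value.
    """
    best = max(range(len(gradient)), key=lambda i: abs(gradient[i]))
    vertex = [0.0] * len(gradient)
    if gradient[best] != 0.0:
        # Move against the sign of the dominant coordinate.
        vertex[best] = -radius if gradient[best] > 0 else radius
    return vertex


if __name__ == "__main__":
    print(lmo_l1_ball([0.5, -2.0, 1.0]))
```

The output is a single signed vertex of the ball, so one oracle call costs a linear scan of the input vector in this toy setting.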
Sample Complexity of Sample Average Approximation for Conditional Stochastic Optimization
In this paper, we study a class of stochastic optimization problems, referred
to as Conditional Stochastic Optimization (CSO), in the form of
$\min_{x \in \mathcal{X}} \mathbb{E}_{\xi} f_\xi\big(\mathbb{E}_{\eta|\xi}[g_\eta(x,\xi)]\big)$, which finds a wide
spectrum of applications including portfolio selection, reinforcement learning,
robust learning, causal inference and so on. Assuming availability of samples
from the distribution $\mathbb{P}(\xi)$ and samples from the conditional distribution
$\mathbb{P}(\eta|\xi)$, we establish the sample complexity of the sample average
approximation (SAA) for CSO, under a variety of structural assumptions, such as
Lipschitz continuity, smoothness, and error bound conditions. We show that the
total sample complexity improves from $\mathcal{O}(d/\epsilon^4)$ to $\mathcal{O}(d/\epsilon^3)$ when
assuming smoothness of the outer function, and further to $\mathcal{O}(1/\epsilon^2)$ when
the empirical function satisfies the quadratic growth condition. We also
establish the sample complexity of a modified SAA when $\xi$ and $\eta$ are
independent. Several numerical experiments further support our theoretical
findings.
Keywords: stochastic optimization, sample average approximation, large
deviations theory
Comment: Typo corrected. Reference added. Revision comments handle
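The nested structure of the sample average approximation described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the toy distributions, the functions `toy_f` and `toy_g`, and the grid search are all hypothetical choices made here. The sketch draws n outer samples of xi and, for each one, m inner samples of eta from the conditional distribution, then replaces both expectations by sample means.

```python
import random


def saa_objective(x, outer_samples, inner_samples, f, g):
    """Empirical CSO objective at x.

    outer_samples[i] is a draw xi_i; inner_samples[i] is a list of draws
    eta_ij from the conditional distribution given xi_i.
    """
    total = 0.0
    for xi, etas in zip(outer_samples, inner_samples):
        # Inner conditional expectation replaced by its sample mean.
        inner_mean = sum(g(eta, x, xi) for eta in etas) / len(etas)
        total += f(xi, inner_mean)
    # Outer expectation replaced by its sample mean.
    return total / len(outer_samples)


def draw_toy_samples(n, m, seed=0):
    """Toy model (an assumption): xi ~ N(0, 1) and eta | xi ~ N(xi, 1)."""
    rng = random.Random(seed)
    outer = [rng.gauss(0.0, 1.0) for _ in range(n)]
    inner = [[rng.gauss(xi, 1.0) for _ in range(m)] for xi in outer]
    return outer, inner


def toy_f(xi, u):
    # Hypothetical outer function f_xi(u) = (u - xi)^2.
    return (u - xi) ** 2


def toy_g(eta, x, xi):
    # Hypothetical inner function g_eta(x, xi) = x * eta.
    return x * eta


if __name__ == "__main__":
    outer, inner = draw_toy_samples(n=200, m=50)
    grid = [i / 20 for i in range(41)]  # candidate x values in [0, 2]
    x_hat = min(grid, key=lambda x: saa_objective(x, outer, inner, toy_f, toy_g))
    print(x_hat)
```

With these toy choices the population objective is $(x-1)^2\,\mathbb{E}[\xi^2]$, which is minimized at x = 1; the grid minimizer of the empirical objective is only an approximation of that point, using n outer draws and n times m inner draws in total.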
Kernel Conditional Moment Constraints for Confounding Robust Inference
We study policy evaluation of offline contextual bandits subject to
unobserved confounders. Sensitivity analysis methods are commonly used to
estimate the policy value under the worst-case confounding over a given
uncertainty set. However, existing work often resorts to some coarse relaxation
of the uncertainty set for the sake of tractability, leading to overly
conservative estimation of the policy value. In this paper, we propose a
general estimator that provides a sharp lower bound of the policy value. It can
be shown that our estimator contains the recently proposed sharp estimator by
Dorn and Guo (2022) as a special case, and our method enables a novel extension
of the classical marginal sensitivity model using f-divergence. To construct
our estimator, we leverage the kernel method to obtain a tractable
approximation to the conditional moment constraints, which traditional
non-sharp estimators failed to take into account. In the theoretical analysis,
we provide a condition for the choice of the kernel which guarantees no
specification error that biases the lower bound estimation. Furthermore, we
provide consistency guarantees of policy evaluation and learning. In the
experiments with synthetic and real-world data, we demonstrate the
effectiveness of the proposed method.
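As a rough illustration of what a coarse relaxation of the uncertainty set can look like, the sketch below assumes the only constraint is that each importance weight lies in a known interval, with no conditional moment constraints at all. This box relaxation, the function name `box_lower_bound`, and its inputs are assumptions made here for illustration; it is not the kernel-based estimator proposed in the abstract.

```python
def box_lower_bound(rewards, lower_weights, upper_weights):
    """Worst-case (smallest) value of the weighted mean reward.

    Illustrative assumption: weight w_i may take any value in
    [lower_weights[i], upper_weights[i]] independently of the others.
    Minimizing (1/n) * sum_i w_i * r_i then decouples per sample:
    use the smallest weight when r_i >= 0 and the largest when r_i < 0.
    """
    total = 0.0
    for r, a, b in zip(rewards, lower_weights, upper_weights):
        total += (a if r >= 0 else b) * r
    return total / len(rewards)


if __name__ == "__main__":
    print(box_lower_bound([1.0, -2.0], [1.0, 1.0], [3.0, 2.0]))
```

Because every weight is chosen adversarially on its own, a bound of this kind can be much more pessimistic than one that also enforces constraints linking the weights together.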
Robust Knowledge Transfer in Tiered Reinforcement Learning
In this paper, we study the Tiered Reinforcement Learning setting, a parallel
transfer learning framework, where the goal is to transfer knowledge from the
low-tier (source) task to the high-tier (target) task to reduce the exploration
risk of the latter while solving the two tasks in parallel. Unlike previous
work, we do not assume the low-tier and high-tier tasks share the same dynamics
or reward functions, and focus on robust knowledge transfer without prior
knowledge on the task similarity. We identify a natural and necessary condition
called "Optimal Value Dominance" for our objective. Under this condition,
we propose novel online learning algorithms that, for the high-tier task,
achieve constant regret on a subset of states depending on the task
similarity and retain near-optimal regret when the two tasks are dissimilar,
while for the low-tier task they remain near-optimal without sacrifice.
Moreover, we study the setting with multiple low-tier tasks and
propose a novel transfer source selection mechanism, which can combine the
information from all low-tier tasks and allow provable benefits on a much
larger state-action space.
Comment: 46 Pages; 1 Figure; NeurIPS 202